NSF PAR Search | NSF Public Access Repository

Refining Labeling Functions with Limited Labeled Data

https://doi.org/10.1145/3711896.3737102

Li, Chenjie; Gilad, Amir; Glavic, Boris; Miao, Zhengjie; Roy, Sudeepa (August 2025, ACM)

Free, publicly-accessible full text available August 3, 2026
Qr-Hint: Actionable Hints Towards Correcting Wrong SQL Queries

https://doi.org/10.1145/3654995

Hu, Yihao; Gilad, Amir; Stephens-Martinez, Kristin; Roy, Sudeepa; Yang, Jun (May 2024, Proceedings of the ACM on Management of Data)

We describe a system called Qr-Hint that, given a (correct) target query Q* and a (wrong) working query Q, both expressed in SQL, provides actionable hints for the user to fix the working query so that it becomes semantically equivalent to the target. It is particularly useful in an educational setting, where novices can receive help from Qr-Hint without requiring extensive personal tutoring. Since there are many different ways to write a correct query, we do not want to base our hints completely on how Q* is written; instead, starting with the user's own working query, Qr-Hint purposefully guides the user through a sequence of steps that provably lead to a correct query, which will be equivalent to Q* but may still look quite different from it. Ideally, we would like Qr-Hint's hints to lead to the smallest possible corrections to Q. However, optimality is not always achievable in this case due to some foundational hurdles such as the undecidability of SQL query equivalence and the complexity of logic minimization. Nonetheless, by carefully decomposing and formulating the problems and developing principled solutions, we are able to provide provably correct and locally optimal hints through Qr-Hint. We show the effectiveness of Qr-Hint through quality and performance experiments as well as a user study in an educational setting.
Full Text Available
What Teaching Databases Taught Us about Researching Databases: Extended Talk Abstract

https://doi.org/10.1145/3663649.3664375

Yang, Jun; Gilad, Amir; Hu, Yihao; Meng, Hanze; Miao, Zhengjie; Roy, Sudeepa; Stephens-Martinez, Kristin (June 2024, ACM)

Full Text Available
Evaluating Pre-trial Programs Using Interpretable Machine Learning Matching Algorithms for Causal Inference

https://doi.org/10.1609/aaai.v38i20.30239

Seale-Carlisle, Travis; Jain, Saksham; Lee, Courtney; Levenson, Caroline; Ramprasad, Swathi; Garrett, Brandon; Roy, Sudeepa; Rudin, Cynthia; Volfovsky, Alexander (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

After a person is arrested and charged with a crime, they may be released on bail and required to participate in a community supervision program while awaiting trial. These 'pre-trial programs' are common throughout the United States, but very little research has demonstrated their effectiveness. Researchers have emphasized the need for more rigorous program evaluation methods, which we introduce in this article. We describe a program evaluation pipeline that uses recent interpretable machine learning techniques for observational causal inference, and demonstrate these techniques in a study of a pre-trial program in Durham, North Carolina. Our findings show no evidence that the program either significantly increased or decreased the probability of new criminal charges. If these findings replicate, the criminal-legal system needs to either improve pre-trial programs or consider alternatives to them. The simplest option is to release low-risk individuals back into the community without subjecting them to any restrictions or conditions. Another option is to assign individuals to pre-trial programs that incentivize pro-social behavior. We believe that the techniques introduced here can provide researchers the rigorous tools they need to evaluate these programs.
Full Text Available
How Database Theory Helps Teach Relational Queries in Database Education (Invited Talk)

https://doi.org/10.4230/LIPICS.ICDT.2024.2

Roy, Sudeepa; Gilad, Amir; Hu, Yihao; Meng, Hanze; Miao, Zhengjie; Stephens-Martinez, Kristin; Yang, Jun (January 2024, Schloss Dagstuhl – Leibniz-Zentrum für Informatik)
Cormode, Graham; Shekelyan, Michael (Ed.)
Data analytics skills have become an indispensable part of any education that seeks to prepare its students for the modern workforce. Essential in this skill set is the ability to work with structured relational data. Relational queries are based on logic and may be declarative in nature, posing new challenges to novices and students. With manual teaching resources limited and enrollment growing rapidly, automated tools that help students debug queries and explain errors are potential game-changers in database education. We present a suite of tools built on the foundations of database theory that has been used by over 1600 students in database classes at Duke University, showcasing a high-impact application of database theory in database education.
Full Text Available
Explaining Differentially Private Query Results with DPXPlain

https://doi.org/10.14778/3611540.3611596

Wang, Tingyu; Tao, Yuchao; Gilad, Amir; Machanavajjhala, Ashwin; Roy, Sudeepa (August 2023, Proceedings of the VLDB Endowment)

Employing Differential Privacy (DP), the state-of-the-art privacy standard, to answer aggregate database queries poses new challenges for users to understand the trends and anomalies observed in the query results: Is the unexpected answer due to the data itself, or is it due to the extra noise that must be added to preserve DP? We propose to demonstrate DPXPlain, the first system for explaining group-by aggregate query answers with DP. DPXPlain allows users to compare values of two groups and receive a validity check, and further provides an explanation table with an interactive visualization, containing the approximately 'top-k' explanation predicates along with their relative influences and ranks in the form of confidence intervals, while guaranteeing DP in all steps.
Full Text Available
Characterizing and Verifying Queries Via CINSGEN

https://doi.org/10.1145/3555041.3589721

Meng, Hanze; Miao, Zhengjie; Gilad, Amir; Roy, Sudeepa; Yang, Jun (June 2023, SIGMOD/PODS '23: International Conference on Management of Data)

Full Text Available
Causal What-If and How-To Analysis Using HypeR

https://doi.org/10.1109/ICDE55515.2023.00293

Shen, Fangzhu; Heravi, Kayvon; Gomez, Oscar; Galhotra, Sainyam; Gilad, Amir; Roy, Sudeepa; Salimi, Babak (April 2023, 2023 IEEE 39th International Conference on Data Engineering (ICDE))

Full Text Available
CaJaDE: explaining query results by augmenting provenance with context

https://doi.org/10.14778/3554821.3554852

Li, Chenjie; Lee, Juseung; Miao, Zhengjie; Glavic, Boris; Roy, Sudeepa (August 2022, Proceedings of the VLDB Endowment)

In this work, we demonstrate CaJaDE (Context-Aware Join-Augmented Deep Explanations), a system that explains query results by augmenting provenance with contextual information from other related tables in the database. Given two query results whose difference the user wants to understand, we enumerate possible ways of joining the provenance (i.e., contributing input tuples) of these two query results with tuples from other relevant tables in the database that were not used in the query. We use patterns to concisely explain the difference between the augmented provenance of the two query results. CaJaDE, through a comprehensive UI, enables the user to formulate questions and explore explanations interactively.
Full Text Available
Understanding Queries by Conditional Instances

https://doi.org/10.1145/3514221.3517898

Gilad, Amir; Miao, Zhengjie; Roy, Sudeepa; Yang, Jun (June 2022, Proceedings of the 2022 International Conference on Management of Data)

Full Text Available
